Level: Intermediate Pre-requisite Skills: Python
As a Cafe, Restaurant or Bar, I am looking for commercial space in the City of Melbourne where I can open a new venue or extend my existing venue.
I would like to know where similar businesses are located and the density of residents and office workers in comparison.
I want to know the number of seats I should provide based on seating capacity at other similar establishments in the same area and understand how this may have changed over time.
If on street or off-street parking is available nearby this may help me decide where to locate my venue.
Interactive Visualisation: Melbourne Business Locator Tool
This interactive visualisation uses CLUE data to let users explore opportunities for locating your hospitality business in the City of Melbourne.
Open Data Insights:A brief introduction to CLUE data
The City of Melbourne conducts a bi-annual comprehensive survey of its residents and businesses called the "Census of Land Use and Employment (CLUE)". CLUE captures key information on land use, employment, and economic activity across the City of Melbourne.
CLUE datasets are a valuable tool for businesses looking to invest in the City of Melbourne and researchers wanting to understand those factors that influence and shape the dynamics of Australia's second largest metropolis and one of the world's most liveable cities.
CLUE data assists the City of Melbourne's business planning, policy development and strategic decision making. Investors, consultants, students, urban researchers, property analysts, businesses and developers can take advantage of CLUE to understand customers, the marketplace and the changing form and nature of the city.
Source: CLUE ( https://data.melbourne.vic.gov.au/stories/s/CLUE/rt3z-vy3t?src=hdr )
This use case makes extensive use of various CLUE datasets to illustrate the value to Data Scientists, Researchers and Software Developers.
CLUE Data is often coded to a specific location (Latitude and Longitude) and/or to a City precinct, referred to as the "CLUE small area". Datasets may also include the individual city block within a precinct referred to by its CLUE Block ID.
The geospatial coordinates describing these areas as polygons can be downloaded in GeoJSON format and used to show shaded areas on a map, known as a choropleth map. This can be a useful technique for illustrating broad trends or statistics for a city area rather than a specific location.
A map visualisation of CLUE Blocks and small areas can be found at the following links:
Data Quality Analysis: Which CLUE data should I use?
To begin our exploration and use of CLUE data we shall first import the necessary libraries to support our exploratory data analysis and visualisation.
Following are core packages required for this exercise.
The sodapy package is required specifically for accessing open data from SOCRATA compliant open data web sites.
The plotly.express package lets use build interact maps using map box services.
import os
import time
from datetime import datetime
import numpy as np
import pandas as pd
from sodapy import Socrata
import plotly.express as px
To connect to the Melbourne Open Data Portal we must establish a connection using the sodapy library by specifying a domain, being the website domain where the data is hosted, and an application access token which can be requested from the City of Melbourne Open Data portal by registering here (https://data.melbourne.vic.gov.au/signup).
For this exercise we will login anonymously.
apptoken = os.environ.get("SODAPY_APPTOKEN") # Anonymous app token
domain = "data.melbourne.vic.gov.au"
client = Socrata(domain, apptoken) # Open Dataset connection
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Our data requirements from this use case include the following:
For this exercise, we shall start by examining the Residential Dwelling dataset.
Let's retrieve this dataset into a dataframe and inspect the first three rows.
dataresource = client.get_all('dyqx-cfn5')
dataset = pd.DataFrame.from_dict(dataresource)
print(f'The shape of dataset is {dataset.shape}.')
print('Below are the first 3 rows of this dataset:')
dataset.head(3).T
The shape of dataset is (3236, 13). Below are the first 3 rows of this dataset:
| 0 | 1 | 2 | |
|---|---|---|---|
| census_year | 2020 | 2020 | 2020 |
| block_id | 1 | 1 | 1 |
| property_id | 611394 | 611394 | 611394 |
| base_property_id | 611394 | 611394 | 611394 |
| street_address | 545-557 Flinders Street MELBOURNE VIC 3000 | 545-557 Flinders Street MELBOURNE VIC 3000 | 545-557 Flinders Street MELBOURNE VIC 3000 |
| clue_small_area | Melbourne (CBD) | Melbourne (CBD) | Melbourne (CBD) |
| trading_name | 551 Flinders Street MELBOURNE VIC 3000 | 551 Flinders Street MELBOURNE VIC 3000 | 553 Flinders Street MELBOURNE VIC 3000 |
| industry_anzsic4_code | 4511 | 4511 | 4512 |
| industry_anzsic4_description | Cafes and Restaurants | Cafes and Restaurants | Takeaway Food Services |
| seating_type | Seats - Indoor | Seats - Outdoor | Seats - Indoor |
| number_of_seats | 60 | 6 | 12 |
| x_coordinate | 144.9565145 | 144.9565145 | 144.9565145 |
| y_coordinate | -37.82097941 | -37.82097941 | -37.82097941 |
#######################################################
# Cast datatypes to enable aggregation
integer_columns = ['census_year', 'block_id', 'property_id', 'base_property_id', 'industry_anzsic4_code', 'number_of_seats']
fp_columns = ['x_coordinate', 'y_coordinate']
dataset[integer_columns] = dataset[integer_columns].astype(int)
dataset[fp_columns] = dataset[fp_columns].astype(float)
dataset = dataset.convert_dtypes() # convert remaining to string
#######################################################
# Aggregate data
groupbyfields = ['clue_small_area','block_id','y_coordinate','x_coordinate']
aggregatebyfields = {'number_of_seats': ["sum"]}
seatsByLocn = pd.DataFrame(dataset.groupby(groupbyfields, as_index=False).agg(aggregatebyfields))
seatsByLocn.columns = seatsByLocn.columns.map(''.join) # flatten column header
seatsByLocn.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) #rename to match GeoJSON extract
seatsByLocn.rename(columns={'number_of_seatssum': 'number_of_seats'}, inplace=True) #rename to match GeoJSON extract
seatsByLocn['number_of_seats'] = seatsByLocn['number_of_seats'].astype(int)
seatsByLocn.head(5)
| clue_area | block_id | y_coordinate | x_coordinate | number_of_seats | |
|---|---|---|---|---|---|
| 0 | Carlton | 203 | -37.796707 | 144.965534 | 51 |
| 1 | Carlton | 203 | -37.796680 | 144.964900 | 42 |
| 2 | Carlton | 204 | -37.797833 | 144.965174 | 50 |
| 3 | Carlton | 204 | -37.797255 | 144.965754 | 120 |
| 4 | Carlton | 205 | -37.799470 | 144.964893 | 96 |
# Plot Venue seating using Scatter Map
fig = px.scatter_mapbox(seatsByLocn, lat="y_coordinate", lon="x_coordinate", size="number_of_seats",
mapbox_style="stamen-toner",
zoom=12.5,
center = {"lat": -37.813, "lon": 144.945},
opacity=0.70,
hover_name="clue_area",
hover_data={"block_id":False,"number_of_seats":True,"x_coordinate":True,"y_coordinate":True},
color_discrete_sequence=['red'],
labels={'number_of_seats':'Number of Seats', 'y_coordinate':'Lat','x_coordinate':'Long'},
width=950, height=800)
fig.show()
Load the CLUE Blocks in GeoJSON format and verify the location keys.
from urllib.request import urlopen
import json
geoJSON_Id = 'aia8-ryiq' # Melbourne CLUE Block polygons in GeoJSON format
GeoJSONURL = 'https://'+domain+'/api/geospatial/'+geoJSON_Id+'?method=export&format=GeoJSON'
with urlopen(GeoJSONURL) as response:
block = json.load(response)
block["features"][0]['properties'].keys()
dict_keys(['block_id', 'clue_area'])
# Get residential dwelling density
dataresource2 = client.get_all('rm92-h5tq') # Melbourne CLUE Residential Dwellings
dataset2 = pd.DataFrame.from_dict(dataresource2)
# cast datatypes
dataset2[['census_year', 'dwelling_number']] = dataset2[['census_year', 'dwelling_number']].astype(int)
dataset2[['x_coordinate', 'y_coordinate']] = dataset2[['x_coordinate', 'y_coordinate']].astype(float)
dataset2 = dataset2.convert_dtypes() # convert remaining to string
dataset2.dtypes
# create aggregate dataset
groupbyfields = ['block_id','clue_small_area']
aggregatebyfields = {'dwelling_number': ["sum"]}
dwellingsByBlock = pd.DataFrame(dataset2.groupby(groupbyfields, as_index=False).agg(aggregatebyfields))
dwellingsByBlock.columns = dwellingsByBlock.columns.map(''.join) # flatten column header
dwellingsByBlock.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) #rename to match GeoJSON extract
dwellingsByBlock.rename(columns={'dwelling_numbersum': 'dwelling_count'}, inplace=True) #rename to match GeoJSON extract
dwellingsByBlock.head(10)
# Plot residential density using Choropleth Map
range_max = dwellingsByBlock['dwelling_count'].max()
fig = px.choropleth_mapbox(dwellingsByBlock, geojson=block, locations='block_id', color='dwelling_count',
color_continuous_scale=["#FFFF88", "yellow", "orange", "orange",
"orange", "darkorange", "red", "darkred"],
range_color=(0, range_max),
featureidkey="properties.block_id",
mapbox_style="stamen-toner", #"carto-positron",
zoom=12.5,
center = {"lat": -37.813, "lon": 144.945},
opacity=0.5,
hover_name='clue_area',
hover_data={'block_id':True,'dwelling_count':True},
labels={'dwelling_count':'Number of Dwellings','block_id':'CLUE Block Id'},
title='New Business Location',
width=950, height=800
)
# Plot Venue seating using Scatter Map
fig2 = px.scatter_mapbox(seatsByLocn, lat="y_coordinate", lon="x_coordinate", size="number_of_seats",
mapbox_style="stamen-toner",
zoom=12.5,
center = {"lat": -37.813, "lon": 144.945},
opacity=0.70,
hover_name="clue_area",
hover_data={"block_id":False,"number_of_seats":True,"x_coordinate":True,"y_coordinate":True},
color_discrete_sequence=['black'],
labels={'number_of_seats':'Number of Seats', 'y_coordinate':'Lat','x_coordinate':'Long'},
width=950, height=800)
fig.add_trace(fig2.data[0])
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
!jupyter nbconvert usecase-newbusinesslocation.ipynb --to html
[NbConvertApp] Converting notebook usecase-newbusinesslocation.ipynb to html [NbConvertApp] Writing 4680874 bytes to usecase-newbusinesslocation.html